76 research outputs found
Semi-parametric methods of handling missing data in mortal cohorts under non-ignorable missingness.
We propose semi-parametric methods to model cohort data where repeated outcomes may be missing due to death and non-ignorable dropout. Our focus is to obtain inference about the cohort composed of those who are still alive at any time point (partly conditional inference). We propose: i) an inverse probability weighted method that upweights observed subjects to represent subjects who are still alive but are not observed; ii) an outcome regression method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) an augmented inverse probability method that combines the previous two methods and is double robust against model misspecification. These methods are described for both monotone and non-monotone missing data patterns, and are applied to a cohort of elderly adults from the Health and Retirement Study. Sensitivity analysis to departures from the assumption that missingness at some visit t is independent of the outcome at visit t given past observed data and time of death is used in the data application.
Handling missing data in matched case-control studies using multiple imputation.
Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data.
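As an illustration of the variance pooling mentioned in this abstract, the following is a minimal sketch of Rubin's Rules for a scalar estimate such as a log odds ratio. It is not the authors' code; the function name `pool_rubin` and the example numbers are hypothetical, and the per-imputation estimates are assumed to come from fitting the same analysis model to each imputed dataset.

```python
def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's Rules.

    estimates: one point estimate per imputed dataset
    variances: the matching squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    # Pooled point estimate: the average over imputations.
    q_bar = sum(estimates) / m
    # Within-imputation variance: the average of the variances.
    within = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance adds a finite-m correction to the between part.
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log odds ratios from three imputed datasets.
pooled, total_var = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.045])
```

The sketch omits the degrees-of-freedom calculation needed for confidence intervals.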
A latent variable model for improving inference in trials assessing the effect of dose on toxicity and composite efficacy endpoints.
It is often of interest to explore how dose affects the toxicity and efficacy properties of a novel treatment. In oncology, efficacy is often assessed through response, which is defined by a patient having no new tumour lesions and their tumour size shrinking by 30%. Usually response and toxicity are analysed as binary outcomes in early phase trials. Methods have been proposed to improve the efficiency of analysing response by utilising the continuous tumour size information instead of dichotomising it. However, these methods do not allow for toxicity or for different doses. Motivated by a phase II trial testing multiple doses of a treatment against placebo, we propose a latent variable model that can estimate the probability of response and no toxicity (or other related outcomes) for different doses. We assess the confidence interval coverage and efficiency properties of the method, compared to methods that do not use the continuous tumour size, in a simulation study and the real study. The coverage is close to nominal when model assumptions are met, although it can be below nominal when the model is misspecified. Compared to methods that treat response as binary, the method has confidence intervals with 30-50% narrower widths. The method adds considerable efficiency but care must be taken that the model assumptions are reasonable.
Simulating data from marginal structural models for a survival time outcome
Marginal structural models (MSMs) are often used to estimate causal effects
of treatments on survival time outcomes from observational data when
time-dependent confounding may be present. They can be fitted using, e.g.,
inverse probability of treatment weighting (IPTW). It is important to evaluate
the performance of statistical methods in different scenarios, and simulation
studies are a key tool for such evaluations. In such simulation studies, it is
common to generate data in such a way that the model of interest is correctly
specified, but this is not always straightforward when the model of interest is
for potential outcomes, as is an MSM. Methods have been proposed for simulating
from MSMs for a survival outcome, but these methods impose restrictions on the
data-generating mechanism. Here we propose a method that overcomes these
restrictions. The MSM can be a marginal structural logistic model for a
discrete survival time or a Cox or additive hazards MSM for a continuous
survival time. The hazard of the potential survival time can be conditional on
baseline covariates, and the treatment variable can be discrete or continuous.
We illustrate the use of the proposed simulation algorithm by carrying out a
brief simulation study. This study compares the coverage of confidence
intervals calculated in two different ways for causal effect estimates obtained
by fitting an MSM via IPTW. Comment: 29 pages, 2 figures
Methods for handling longitudinal outcome processes truncated by dropout and death.
Cohort data are often incomplete because some subjects drop out of the study, and inverse probability weighting (IPW), multiple imputation (MI), and linear increments (LI) are methods that deal with such missing data. In cohort studies of ageing, missing data can arise from dropout or death. Methods that do not distinguish between these reasons for missingness typically provide inference about a hypothetical cohort where no one can die (immortal cohort). It has been suggested that inference about the cohort composed of those who are still alive at any time point (partly conditional inference) may be more meaningful. MI, LI, and IPW can all be adapted to provide partly conditional inference. In this article, we clarify and compare the assumptions required by these MI, LI, and IPW methods for partly conditional inference on continuous outcomes. We also propose augmented IPW estimators for making partly conditional inference. These are more efficient than IPW estimators and more robust to model misspecification. Our simulation studies show that the methods give approximately unbiased estimates of partly conditional estimands when their assumptions are met, but may be biased otherwise. We illustrate the application of the missing data methods using data from the 'Origins of Variance in the Old-old' Twin study.
Methods for observed-cluster inference when cluster size is informative: a review and clarifications.
Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When there are missing data in Y, the distribution of Y given X in all cluster members ("complete clusters") may be different from the distribution just in members with observed Y ("observed clusters"). Often the former is of interest, but when data are missing because in a fundamental sense Y does not exist (e.g., quality of life for a person who has died), the latter may be more meaningful (quality of life conditional on being alive). Weighted and doubly weighted generalized estimating equations and shared random-effects models have been proposed for observed-cluster inference when cluster size is informative, that is, the distribution of Y given X in observed clusters depends on observed cluster size. We show these methods can be seen as actually giving inference for complete clusters and may not also give observed-cluster inference. This is true even if observed clusters are complete in themselves rather than being the observed part of larger complete clusters: here methods may describe imaginary complete clusters rather than the observed clusters. We show under which conditions shared random-effects models proposed for observed-cluster inference do actually describe members with observed Y. A psoriatic arthritis dataset is used to illustrate the danger of misinterpreting estimates from shared random-effects models. SRS is funded by MRC grants U1052 60558 and MC_US_A030_0015; AJC and MP are funded by MRC grant G0600657.
A general method for elicitation, imputation, and sensitivity analysis for incomplete repeated binary data.
We develop and demonstrate methods to perform sensitivity analyses to assess sensitivity to plausible departures from missing at random in incomplete repeated binary outcome data. We use multiple imputation in the not at random fully conditional specification framework, which includes one or more sensitivity parameters (SPs) for each incomplete variable. The use of an online elicitation questionnaire is demonstrated to obtain expert opinion on the SPs, and highest prior density regions are used alongside opinion pooling methods to display credible regions for SPs. We demonstrate that substantive conclusions can be far more sensitive to departures from the missing at random assumption (MAR) when control and intervention nonresponders depart from MAR differently, and show that the correlation of arm specific SPs in expert opinion is particularly important. We illustrate these methods on the iQuit in Practice smoking cessation trial, which compared the impact of a tailored text messaging system versus standard care on smoking cessation. We show that conclusions about the effect of intervention on smoking cessation outcomes at 8 weeks and 6 months are broadly insensitive to departures from MAR, with conclusions significantly affected only when the difference in behavior between the nonresponders in the two trial arms is larger than expert opinion judges to be realistic.
Introduction to Double Robust Methods for Incomplete Data.
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
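To make the idea of DR estimation of a mean concrete, here is a minimal sketch of the standard augmented IPW form. It is not taken from the paper itself. The function name `aipw_mean` is hypothetical, and the response probabilities `pi_hat` and outcome predictions `m_hat` are assumed to have been fitted elsewhere (for example by logistic and linear regression on fully observed covariates).

```python
def aipw_mean(y, r, pi_hat, m_hat):
    """Augmented IPW estimate of the mean of a partially observed outcome.

    y:      outcomes (any placeholder, e.g. None, where unobserved)
    r:      1 if the outcome is observed, 0 otherwise
    pi_hat: estimated probability that each outcome is observed
    m_hat:  predicted outcome for each unit from an outcome model
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, r, pi_hat, m_hat):
        # IPW term: observed outcomes weighted by 1 / pi.
        ipw_term = yi / pi if ri else 0.0
        # Augmentation term: uses the outcome model for every unit.
        total += ipw_term - (ri - pi) / pi * mi
    return total / len(y)


# Hypothetical data: two observed and two missing outcomes.
estimate = aipw_mean(
    y=[1.0, None, 3.0, None],
    r=[1, 0, 1, 0],
    pi_hat=[0.5, 0.5, 0.5, 0.5],
    m_hat=[2.0, 2.0, 2.0, 2.0],
)
```

Under the usual missing-at-random assumption, this form stays consistent if either the response model or the outcome model is correctly specified, which is the sense of "double robust" used in the abstract.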